Superspeculative Microarchitecture for Beyond AD 2000
Authors
Abstract
In its brief lifetime of 26 years, the microprocessor has achieved a total performance growth of 10,000 times, thanks to technology improvements and microarchitecture innovations. Transistor count and clock frequency each increased by an order of magnitude in each of the microprocessor's first two decades: transistor count grew from 10,000 to 100,000 in the 1970s and to 1 million in the 1980s, while clock frequency rose from 200 kHz to 2 MHz in the 1970s and to 20 MHz in the 1980s. This incredible technology trend has continued: since 1990, both transistor count and clock frequency have already increased another 20 to 30 times.

During the 1980s, sustained instructions per cycle (IPC) also increased by almost an order of magnitude, from roughly 0.1 to 0.9. IPC is a measure of the instruction-level parallelism, or instruction throughput, achieved by the concurrent processing of multiple machine instructions. In the 1990s, IPC improvement is struggling and may not triple by 1999. New microarchitecture innovations are needed.

Current top-of-the-line microprocessors are four-instruction-wide superscalar machines; that is, they can fetch and complete up to four instructions in a single machine cycle. Such machines use pipelined functional units, aggressive branch prediction, dynamic register renaming, and out-of-order execution of instructions to maximize parallelism and tolerate memory latency. State-of-the-art processors include the Digital Equipment Alpha 21264, Silicon Graphics MIPS R10000, IBM/Motorola PowerPC 604, and Intel Pentium Pro. Even with such elaborate microarchitectures, against a potential 4 IPC, these machines typically sustain only about 0.5 to 1.5 IPC on real-world programs.

Worse yet, most studies indicate that machine efficiency drops even lower as we extrapolate to wider machines. One recent study found that while a hypothetical 2-instruction-wide machine achieves an IPC of 0.65 to 1.40, a comparable hypothetical 6-instruction-wide machine achieves only 1.2 to 2.3 IPC [1]. Such data imply that the current superscalar paradigm is running into rapidly diminishing returns on performance.

Future billion-transistor chips will inevitably implement machines that are much wider (issuing more than four instructions at once) and deeper (with longer pipelines). The question is, how do we harvest additional parallelism proportional to the increased machine resources? Several approaches have vocal advocates, each with valid reasons; they are

• reconfigurable parallel computing engines;
• specialized, very long instruction word (VLIW) machines;
• wide, simultaneous multithreaded (SMT) uniprocessors;
• single-chip multiprocessors (CMP);
• memory-centric computing engines (such as IRAM);
• …
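A back-of-the-envelope consistency check on the growth figures above (my own arithmetic using only the abstract's numbers, not a calculation from the paper itself): sustained performance scales as the product of clock frequency and sustained IPC,

\[ \text{performance} \;\propto\; f_{\mathrm{clk}} \times \mathrm{IPC}. \]

Plugging in the quoted values, clock frequency grew 100x through the 1980s (200 kHz to 20 MHz) and another 20 to 30x since 1990, roughly 2,000-3,000x overall, while sustained IPC grew about 9x (0.1 to 0.9). The product, a few times \(10^{4}\), agrees to within a small factor with the quoted 10,000x total performance growth.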
Similar articles
A survey of new research directions in microprocessors
Current microprocessors exploit instruction-level parallelism through a deep processor pipeline and the superscalar instruction-issue technique. VLSI technology offers several solutions for aggressive exploitation of instruction-level parallelism in future generations of microprocessors. Technological advances will replace gate delay with on-chip wire delay as the main obstacle to increase...
Execution Performance of the Scheduled Dataflow Architecture (SDF)
This paper presents an evaluation of a nonblocking, decoupled memory/execution, multithreaded architecture known as the Scheduled Dataflow (SDF). Recent focus in the field of new processor architectures is mainly on VLIW (e.g. IA-64), superscalar and superspeculative designs. This trend allows for better performance at the expense of increased hardware complexity, and possibly higher power expe...
Hydrodynamic Models for Heavy-Ion Collisions, and beyond
A generic property of a first-order phase transition in equilibrium, and in the limit of large entropy per unit of conserved charge, is the smallness of the isentropic speed of sound in the “mixed phase”. A specific prediction is that this should lead to a non-isotropic momentum distribution of nucleons in the reaction plane (for energies ∼ 40A GeV in our model calculation). On the other hand, ...
TEAPC: Adaptive Computing and Underclocking in a Real PC
TEAPC is an IBM/Intel-standard PC realization of the TEAtime performance-"maximizing" adaptive computing algorithm, giving performance beyond worst-case specifications. TEAPC goes beyond the TEAtime algorithm by adapting to the current CPU load. It is also the first machine to use extensive underclocking for disaster tolerance, low power consumption, and high reliability. This is all done dynamic...
Packings and Approximate Packings of Spheres
Close-packings of uniformly-sized spheres with centres on various lattices are described, with volume fractions equal or close to the maximum possible, \(\pi/\sqrt{18}\) (this value had long been 'known' via Kepler's conjecture, and has since been proved). Regular packings with two or three sizes of sphere can push this volume fraction beyond 80%. The bulk of the paper studies irregular 'packings' of a large sp...
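For context on those density figures (a standard value, supplied here because the expression was garbled in extraction), the Kepler bound for equal spheres is

\[ \frac{\pi}{\sqrt{18}} \;=\; \frac{\pi}{3\sqrt{2}} \;\approx\; 0.74048, \]

so the regular two- and three-size packings mentioned above, at over 80%, exceed the single-size optimum by a clear margin.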
Journal: IEEE Computer
Volume: 30
Issue: -
Pages: -
Publication date: 1997